Detection of Breast Cancer from Five-View Thermal Images Using Convolutional Neural Networks

Breast cancer is one of the most common forms of cancer. Its aggressive nature coupled with high mortality rates makes this cancer life-threatening; hence early detection gives the patient a greater chance of survival. Currently, the preferred diagnosis method is mammography. However, mammography is expensive and exposes the patient to radiation. A cost-effective and less invasive method known as thermography is gaining popularity. Bearing this in mind, the work aims to initially create machine learning models based on convolutional neural networks using multiple thermal views of the breast to detect breast cancer using the Visual DMR dataset. The performances of these models are then verified with the clinical data. Findings indicate that the addition of clinical data decisions to the model helped increase its performance. After building and testing two models with different architectures, the model used the same architecture for all three views performed best. It performed with an accuracy of 85.4%, which increased to 93.8% after the clinical data decision was added. After the addition of clinical data decisions, the model was able to classify more patients correctly with a specificity of 96.7% and sensitivity of 88.9% when considering sick patients as the positive class. Currently, thermography is among the lesser-known diagnosis methods with only one public dataset. We hope our work will divert more attention to this area.


Introduction
Cancer is among one of the most predominant diseases prevalent today. Among its various types, breast cancer has become quite notable. Breast cancer can affect both males and females. It affects more than 2.3 million women, which accounts for more than 11% of the total cancer cases in the world as stated by GLOBOCAN 2020 [1]. Meanwhile, cases in men are rare, having an incidence rate of 0.5-1% [2].
Breast cancer accounts for 14% of cancers in India. Kerala records the highest cancer rates as compared to the rest of the country. When considering breast cancer specifically in Kerala, it is quite common among women, accounting for 30-35% of the cancer cases. A prevalence rate of 19.8 per 100,000 and 30.5 per 100,000 was observed in rural and urban areas, respectively, as per the iruvananthapuram Cancer Registry [3].
is could be attributed to the increasing elderly population in Kerala, accounting for nearly 20.9% of their population by 2031.
Studies have shown an increment in the occurrence of breast cancer among the younger population as opposed to 25 years ago in India. About 25 years ago, 69% of the affected individuals were above 50. However, trends show that approximately 48% of them are below the age of 50. Breast cancer was found to account for 23-32% of the cancer cases in women by 2012, overtaking cervical cancer as the most prevalent cancer type [4].
Older women (age greater than 65) have a greater chance of having breast cancer when compared with younger women (age less than 35). Hence, many young women do not undergo cancer detection screenings until the age of 40. Due to this, the survival rates among the younger affected population are much lower than what is observed for affected women of ages 65-74. [5] ere are several treatments available today that try to combat the effects of breast cancer. ese include surgery to remove cancerous cells, chemotherapy, and radiation. With rising cases and increasing mortality rates, breast cancer has become a major medical concern within society.
A convolutional neural network is built, which considers five thermal views of the breast along with the patient's clinical data. is then is used for the classification of the patient as healthy or sick. e main contributions of this work include the following: (i) Performance comparison of single-input CNN and multi-input CNN to see which one gives better predictions (ii) Providing experimental proof to show that adding clinical data improves the prediction of the model (iii) Coupling results of a thermography with the other screening techniques can help improve prediction results In this paper, the section organization is as follows. Section 2 takes a closer look at the origin of the problem and the problem statement that is being worked on. An extensive review of breast cancer and the existing work done for its identification is discussed in Section 3. Details regarding the dataset and the data collected are outlined in Section 4. Sections 5 and 6 present the methods for creation of model and data preparation. Results are discussed in Section 7. Conclusion and future plans are mentioned in Section 8.

Origin of Work.
Breast cancer is a prevalent form of cancer in women. Over the years, there has been significant amount of research that incorporates machine learning principles to help predict the existence of cancerous cells in the breast with high accuracy. Further studies have been done in the localization of such regions which helps radiologists study it further and provide the necessary treatments. In [6], the authors have tried machine learning algorithms to predict the presence of cancer using thermal images of the breast. e most common screening technique currently used is mammography which has disadvantages such as exposure to radiation, high costs to get a screening, and discomfort to the patient. Paper [6] is among the first few papers that worked on adding clinical data of the patient to the multi-input convolutional neural network.

Problem Definition.
Detection of breast cancer is a challenging task and can be life-threatening if not detected at an early stage. ere are many tools and technological advances to detect breast cancer. Mammography has become a popular screening technique. However, mammography exposes the patient to radiation and causes discomfort to the patient. ermographic screenings require no contact with the machine and are lower in cost than mammography enabling patients to get screenings more often. e advancement of artificial intelligence technology has enabled a deep neural network approach in aiding medical practitioners in rapid diagnosis [7,8]. e convolutional neural network designed uses multiple thermal views of the breast for each patient from the benchmark dataset. It compares the effect of the addition of clinical information collected of each patient to the model.

Literature Review
Breast cancer can be detected using many methods; some of them are discussed in Table 1. ermography is considered a less invasive and cost-efficient screening technique compared with mammograms [16]. Mammography requires the patient to have contact with some machinery during the process, but thermography requires no contact while screening. Furthermore, there is no exposure to radiation that would otherwise arise if mammography was done. ermography could also be used to diagnose women of all ages with any breast density, hence, making thermography more suitable.
Due to the aggressive nature of breast cancer and its rapid multiplicity, the cells will expend a lot of energy because of the repeated cell divisions. Hence, the presence of lesions on the breast can be detected due to this temperature difference. For this study, images were collected using quantitative thermography. e main reason is because the presence of tiny lesions for the early detection of cancer is significant and such temperature differences should be collected for each pixel to get a complete in-depth view of the patient's breasts. When qualitative thermography is used, an in-depth detailed image of the breast is not obtained, and such images may not be able to correctly differentiate the slight temperature variations which detect the early presence of cancer. A quantitative measurement camera can also provide the temperature variations of such regions with the rest of the body which can help any model in detecting the presence of cancer. Table 2 shows the different approaches of detecting breast cancer using thermal images. e popularity of thermography is relatively quite recent. Earlier thermal images were analyzed by humans making the process strenuous and inaccurate. With the emergence of artificial intelligence (AI) and machine learning (ML) algorithms, thermal images can be intercepted and used in a whole new different way. us, it makes thermography a very powerful method that can overcome the disadvantages of mammograms [28,29].
Currently, there are not many studies that test the efficiency of thermograms in predicting the possibility of breast cancer. Hence, devising a model along these lines could contribute to the ongoing research in this area.

Observations.
Breast cancer has risen to be one of the most widespread cancer types with a significant fatality rate depending on various factors such as age. Early detection is a

MRI scan
It uses radio waves to produce scans of the body. It is used to determine the presence of cancerous cells in the body and to see the tumor's growth region. It does not expose a person to radiation, hence a safer alternative to mammograms. In certain cases, a gadolinium-based dye or contrast material could be injected into the arm to show the images more clearly [10].

Mammograms
It produces an x-ray of the breast used for early detection of breast cancer. During this procedure, the breasts are compressed by compression plates. It is one of the most effective methods for detecting breast cancer when no lump is visible or when the patient would like to scan any specific area that shows certain symptoms associated with the disease. If a woman has a high risk of breast cancer, screening starts at a younger age [11,12].

Ultrasound
It uses sound waves to produce images of the breast. It is not usually employed as a screening tool, but is used along with a mammogram to find the existence of a fluid filled cyst or tumour. ey are similar to thermography where there is no exposure to radiation and are effective for women with dense breasts, pregnant women, and young women (age below 25) [13].
ermography It captures the image of breasts using a device which essentially evaluates the surface temperature of the skin of the breasts. During this screening process, there is no radiation, no contact between the patient and the device, and no breast compression, making it a desirable screening tool. It is based on the principle that we know cancer cells grow and spread to different regions fast due to their high metabolism. As metabolism increases, the temperature at that region also increases which can be used to detect the presence of cancerous cells [14,15]. crucial method to combat this disease. Compared to relatively older women (above 65 years old), adolescents and teenagers have a much lower probability of being diagnosed with breast cancer. is is mostly because breast cancer screening is done after they are 40. ree methods discussed in this report for early detection include mammography, clinical, and self-examination. e convolution neural network implementation has been made to predict breast cancer.
e CNN mechanism classifies the image by breaking it down into its features, reconstructing it, and predicting it at the end. e edge-based samples have been considered to reduce the comparison time and space. is results in increased accuracy.
e reason for picking thermography over mammography is, as mentioned in the report, the boom of AI opened new possibilities, and thermography being a contact-less screening method makes it more preferred over mammography. e idea behind adding clinical data along with the thermal images is that it enhances the accuracy of the model, as exhibited in [6]. Table 3 shows some pros and cons regarding certain other approaches discussed earlier.

Impact of Technology on Breast Cancer Prediction.
Technology has a lot of impact on the early prediction of breast cancer. Image processing is a very important component for its prediction. Some of the techniques used to obtain images are computerized tomography (CT) scan, MRI, mammograms, thermography, and ultrasound. One very important application of using technology is that they can be used to identify and detect regions of growth and segment those regions and radiologists can use this to study cancers and help the patients [34].

Artificial Neural Network.
One of the most important uses of artificial neural networks is their ability to be used in place of mammography and breast MRIs where they can be used to screen severe cases of cancer. ANNs can be used to detect cancer and can be used to find benign tumours as efficiently and accurately as possible [35]. Mammography can be expensive and as the patient is exposed to radiation, mammography cannot be used as a screening tool at regular intervals. Mammography can cause patients some discomfort which is eliminated when using neural networks in the early detection of breast cancer.

Convolutional Neural Network.
Convolutional neural networks have the ability to extract complex features from any image that it processes [36]. ey help us find patterns that are common among different images. e basic operation which is performed here is that, for each cell or pixel in the image matrix, the cell is multiplied by an element in the weight matrix, followed by the addition of the resulting products. Going through a CNN, the first few layers can help us identify edges in the image, and going deeper in the network, extracts more information from them. Moving across the network, we reduce the dimensions of an image, while retaining the important information used to make a prediction. Table 4 shows some CNN models prevalent today.

Mammographic Images Dataset
is dataset consists of only the clinical data generated from around 40,000 mammograms (20,000 digital mammograms and 20,000 film screen mammograms) collected between the years 2005 and 2008 by the Breast Cancer Surveillance Consortium. e clinical data includes mammogram assessment, details on breast cancer diagnosis within one year, age of the patient, family history of breast cancer, biopsy details, breast density, etc. [40]. is dataset was compiled in 2015 and its version is 1.21 [40,41].

Digital Database for Screening Mammography (DDSM).
e dataset of mammograms consists of 2500 studies where each sample consists of two images of the breast along with the patient's clinical information such as breast density; presence of abnormalities is also mentioned in the dataset. e resolution of the images ranged from 42 microns to 50 microns. e images collected were classified as either normal, benign, or malignant. e datasets were compiled by Massachusetts General Hospital, Sandia National Laboratories, and the University of South Florida Computer Science and Engineering Department [42].

4.2.
ermographic Images Dataset. ermography is a detection method that uses an infrared camera to identify heat patterns based on the infrared emissions given off by an individual. Database for Mastology Research (DMR) [43] is an online platform that stores and manages mastologic images for early detection of breast cancer. is database includes the data of the patients from University Hospital Antonio Pedro (HUAP) of the Federal Fluminense University of Brazil. e database contains data from 293 patients. e database includes thermal images (front, lateral views), thermal matrix (where each pixel contains the thermal information), and clinical and personal information (this includes menstrual details, medical history, data acquired as part of protocols, previously diagnosed positive or negative, age, and eating habits) of the patient. ere are two protocols that are used in thermography because the whole point of the thermogram is to note how the body cells react under different temperature conditions. Static protocol: patients will be asked to rest themselves with the help of brackets, whilst frontal and lateral views (including right, left, right oblique, and left oblique views) imagery will be captured. e whole point of this is for the patient to attain thermal stability.

ResNet
Residual networks make use of residual blocks to train very deep neural networks. Making use of these blocks helps prevent the complications that come with training deep networks such as accuracy degradation. e key idea behind these blocks is to have the output of one layer be sent as an input to a layer deeper in the network. is is then added with layer's output in its normal path before the nonlinear function is applied. e variants of ResNet include ResNet-50, ResNet-1202, ResNet-110, ResNet-164, ResNet-101, and ResNet-152 [30].

DenseNet
In the DenseNet architecture, for any layer in the convolutional neural network, the input will be the concatenated result of the output of all the previous layers in the network. By combining information from the previous layers, the computational power can be reduced. rough these connections across all layers, the model is able to retain more information moving across the network improving the overall performance of the network. e variants of DenseNet are DenseNet-B and DenseNet-BC [31].

MobileNet
MobileNet is an architecture that is implemented for mobile devices. ey use depth-wise separable convolutions where the filters are added for individual input channels as opposed to the general CNN models. ese each channel separate and stack the convoluted outputs together. e different variants include MobileNetV1 and MobileNetV2 [32].
ShuffleNet ey are efficient convolutional neural networks which use filter shuffling to reduce computational complexity. ey can reduce the time required to train a model and are used most commonly in smartphones. Shuffling the filters allows more information to pass the network which increases the accuracy of the network with a limited number of resources. e different ShuffleNet models include ShuffleNetV1 and ShuffleNetV2 [33].  [27,39] Journal of Healthcare Engineering 5 Dynamic protocol: once the thermal stability is acquired, the patient will be subjected to cooling using a conditioning fan. Frontal and lateral images are obtained sequentially (5second gap).
Images are recorded using a FLIR SC620 thermal camera which is captured using the static and dynamic protocols. eir dimensions were 640 × 480.

5.1.
Steps for Predicting Breast Cancer Using ermal Images. e dataset was obtained from the Database for Mastology Research (DMR). e thermal images of five different views (front, left 45°, right 45°, left 90°, and right 90°) of the breasts were used. Along with this, the clinical data of each patient was recorded using web scraping.
After the extraction of thermal images and clinical data, we moved on to image and data preprocessing. e images that we collected were of dimensions 640 × 480. Patients that had fuzzy images, images that did not follow protocol, or did not have all five views were removed from the dataset. For data preprocessing, any anomalous entries were deleted as well. In the current implementation, the CNN models that were built and trained worked for square images. Hence, as part of data preprocessing, all the images were transformed to images of size 640 × 640. To transform each image to a square image, the resize functionality provided by PyTorch was used. After this, it passes through various classification methods as shown in Figure 1.

Architecture Diagram.
Once the image and clinical data preprocessing is completed, the focus shifts to building models that could be used for classification. In the current design, a convolutional neural network is built for each of the five views of the breast. Each model is trained separately to make sure that they perform well individually before combining their output. e output of each model will be a tuple containing the probability of the patient being sick and the probability of the patient being healthy. e probability of a patient being diagnosed as healthy is denoted by P(H) and the likelihood of a patient being diagnosed as sick is P(S). Table 5 shows the output that would be generated by each model.
Once the training of the five neural networks is completed, the results will be combined to train the final neural network. e combined result will be a 10-element tuple such that each of the 5 views will contribute the probability of a healthy and sick diagnosis. e set of these tuples for all of the patients who were part of the training data is then used to build and train the final neural network. e resulting output of the final neural network will be the prediction of healthy or sick by the multi-input CNN model. e next step is to add personal and clinical data to the model and see the change in the performance of the model after appending the result of the model that was trained on the clinical data. e clinical data collected includes information about the age, conditions, symptoms, family history, menarche, etc. Once categorization of this dataset is completed, a neural network is built for the clinical data. e output of this neural network will be a value between 0 and 1. A threshold of 0.5 is set to decide if the prediction is healthy or sick.
(i) If P (clinical data) < 0.5, patient diagnosis prediction: healthy (ii) If P (clinical data) > 0.5, patient diagnosis prediction: sick Once this clinical data neural network model is trained, its output will be appended to the output of the multi-image CNN model giving an 11-element tuple which would be passed as input to the final neural network. Both images and clinical data are treated separately and only the outputs from their respective models (i.e., CNN for images and neural network for clinical data) are concatenate to determine the final classification. Different performance evaluation metrics are run on the final neural network to see how the addition of clinical data to the model affects its performance. A detailed overview of the steps followed to obtain the prediction using thermal  images alone and thermal images along with clinical data is shown in Figure 2.

Data Preparation
e images are recorded using a FLIR SC620 thermal camera, which is captured using static and dynamic protocols (to see details of each protocol, refer to the dataset section of the report). e different views captured are shown in Figure 3.

Image Preprocessing.
(i) Identify the inaccurate or incomplete entries, then either try to rectify it or delete the entry. e same is applied to thermal and personal clinical data (ii) All patients that did not have all five views (i.e., front, left 45°, right 45°, left 90°, and right 90°) were removed from the dataset (iii) If the static view of a front or lateral view was not available or was fuzzy, we replaced them with images taken with the dynamic protocol (iv) Images will be removed if they are as follows: (i) Blurry and barely visible (refer to Figure 4(a)) (ii) Presence of injury (refer to Figure 4(b)) (iii) Proper protocols were not followed during the process of data collection (for example, keeping arms down or images taken from a different angle) (refer to Figure 4(c))

Clinical Data Preprocessing.
(i) Since the database was procured from Brazil, the text had to be converted to English (ii) Patient's age was constantly updated in the database which was converted to the age at the time of visit (iii) Some cases such as Patient 398 had their age as 0 and Patient 211 had their age as 120. Such anomalies were removed (iv) All entries which were blank were filled appropriately with values such as "not answered" (v) (Patient's age was constantly updated in the database which was converted to the age at the time of visit (vi) Some cases such as Patient 398 had their age as 0 and Patient 211 had their age as 120. Such anomalies were removed (vii) All blank entries were filled appropriately with values such as "not answered" (viii) e final set of features included the following: Using these layers, each thermal image is gone through the models independently and their output is concatenated to predict the outcome of the disease.

Current Dataset after Preprocessing.
After preprocessing, there are 157 healthy patients and 84 sick patients. is is split into 80% for training set and 20% for test set as shown in Table 6. e 80/20 ratio is the most common combination of splitting training set and test set which can provide the best results. In general, 70-80% is used for training and the remaining 20-30% for testing.

Formulas Used for Model Evaluation.
e sick class is considered positive, while the healthy class is deemed to be negative. Table 7 outlines the metrics used for the evaluation of the models.

Loss, Optimizers, and Normalizing Functions.
For all the models discussed below, a Stochastic Gradient Descent (SGD) optimizer was used. e Adam optimizer was also tried; however, it was noticed that the model was overfitting very quickly and it was negatively affecting the overall diagnosis prediction. When SGD was used with a learning rate of 0.001, the loss reduced after each epoch, and training of the model was stopped after a certain number of epochs before the loss started to flatten. To calculate the loss between the actual value and predicted value, Cross Entropy Loss function was used.
Before the data was passed through the ANN, the data was normalized.
is was done using the MinMaxScaler. e reason the scalar is used is to remove any form of bias before passing it to an ANN so that the decision-making process of the neural network may not be affected by any other external factor. e models were trained batch-wise so that the models could see different sets of inputs to improve its performance. All 3 Views. In the current implementation, the models use 3 views of the    Figure 5 uses convolutional layers with channel sizes of 32, 64, and 128. We use strides with a value of 2 and pooling layers of size 2 × 2 to reduce the matrix size from 640 × 640 × 1 to a 1 × 2 matrix containing the probabilities. In this multi-input CNN, the same convolution neural network architecture was used to train all three views and their outputs were combined. After this, another model that combines the result of the three CNNs and the neural network trained on the clinical data is built. e results of the model were evaluated with the same model after the decision from the clinical data was added.    Table 8.

Model 1: Same Architecture for
e area under the ROC curve is around 0.89 without clinical data and 0.987 with clinical data. As this dataset has a class imbalance and since we are more interested in the prediction of sick patients, we calculate the precision and recall and the curve associated with them. We see that a perfect predictive model would have a curve at (1,1). e precision-recall curve is seen to be tending towards (1,1) indicating that the model is performing well. e AUC score for this precision-recall curve is 0.841 without clinical data and 0.977 with clinical data. e plots for the model with and without clinical can be seen in Figure 6.
To give a better perspective, a blue dotted line is drawn on both graphs to denote a model that has no skill; i.e., it simply outputs a prediction randomly. If the graph of the model, here, the multi-input CNN, is above this line (as is seen by the ROC curve and the precision-recall curve), it signifies a much better performance than the no skill model.

Model 2: Different Architectures for Frontal and 90°Views.
is model also uses the 3 views of the thermographic images of the breast (frontal, left 90°, and right 90°). In the   Research also shows that factors such as age [44] or hormone replacement [45] play a role in the disease. Such data coupled with the thermal images would add value as they give a more extensive idea of the patient and their health. e models created so far have been able to perform with high accuracy after adding the patient's clinical information as seen from their performance. ere are some signs and symptoms that can indicate cancerous cells in the body.   Hence, such clinical information can effectively predict the presence of breast cancer. After collecting the patient's information, it is passed through an ANN. ANNs use backpropagation that helps to identify which features are more important and then use this information to make a decision.
Since the features collected are general to any patient, this information can be used for any other problem. For example, if any other cancer such as lung cancer was being detected, the same ANN can be used to train the model with the patient's diagnosis. e ANN then learns which feature is most important for the problem of lung cancer and uses that information to make a decision. Hence, the addition of clinical data is a very easy and effective tool to diagnose a patient, and thus it can be used for a variety of purposes.
Currently, it was observed that most datasets only collect the images of the different screening of a patient and use it to make a decision. is work aims to encourage more people to start collecting clinical data along with the images and use this information also to make a diagnosis prediction.

CNN.
Generally, the CNN used to classify the frontal images was able to classify both the healthy and sick patients with an accuracy above 0.7. When different CNN models were being trained for the different views, it was noticed that the models were able to classify either healthy samples or sick samples with high accuracy while the other class had an accuracy of 0.6. In such cases, different epochs for the different models were tried and the final model was built after ensuring that they perform their best individually. e performance of the models that handle each view before the combination of their results is shown in Table 10.   Data augmentation techniques are mainly used when the dataset is small to generate more samples for the model to learn better. Multi-view CNNs perform better than singleview CNN. In multi-view CNN, different views of the same object are being used as input, so information and features generated from each image can be pooled together to improve the prediction. And data augmentation techniques have been used to complement or improve the performance of the multi-view CNNs [46].
Quantitative comparison of our work with other methods using thermographic images is shown in Table 11.

Conclusion
From the current work, it is found that the addition of clinical data to the convolutional neural networks increased the ability of the model to classify a patient as healthy or sick correctly. We see an accuracy of 93.8% for the model that uses data over 85.4% for the one that does not. Addition of clinical data helps in strengthening the prediction of breast cancer using CNN models. ese findings could also bring to light the importance of increasing the research in the area of thermography. Increased research could improve the techniques employed for thermography and thus help it to become a standard in breast cancer detection in the future.
Since only a multi-input CNN was used for classification, a comparative study of the performance of a single-input CNN with the current input is planned. Along with this, another possible experimental study is using pretrained CNN models for classification. Some techniques for data preprocessing that could be explored are data augmentation and image segmentation. It is interesting to see how each model interacts with the limited data available.
Data Availability e data that was used to obtain the findings can be found in the publicly available Visual DMR dataset [43].